0.1 What's It All About?

This is an exhaustive analytics report that gives clear insights into the video game industry, from its earliest days to its peak. The dataset is from Kaggle.

This report answers a number of questions about the video game industry; some of them are stated below.

0.1.1 Introducing Our Source: The Data

The data is in CSV (comma-separated values) format, and its dimensions are 16,598 rows by 11 columns. The names of all the columns and their meanings are stated below:
Attribute      Meaning
Rank           Rank of the video game
Name           Name of the video game
Platform       Platform for which it was developed
Year           Year of release
Genre          Type of game
Publisher      Publisher/developing company
NA_Sales       Total sales in North America
EU_Sales       Total sales in Europe
JP_Sales       Total sales in Japan
Other_Sales    Sales in all other countries
Global_Sales   Total global sales

All sales figures are in millions.

There is missing data in the CSV, so we have to clean and tidy the data.

0.1.2 Data Wrangling

Data wrangling is the collective term for data cleaning and data tidying. In this process we do the following:

  • Check data consistency and duplicates
  • Check for missing data
  • Check for outliers
  • Find a strong reason before removing outliers
  • Fill in the missing values
  • Replace corrupted data with proper data
  • Feature engineering: the process of creating new features

Let's get hands-on:

First, we convert the categorical character columns into factors so that we can easily apply statistical modelling functions; factors are also handy in plotting libraries like ggplot2.
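
A minimal base-R sketch of this conversion, assuming the four categorical columns are Platform, Year, Genre and Publisher (Name stays character, matching the summary output below). The three rows are hypothetical toy data, not rows from the Kaggle file. Note that the literal string "N/A" simply becomes another factor level here, which is the problem addressed next.

```r
# Hypothetical toy rows standing in for the Kaggle CSV (not real data)
df <- data.frame(
  Name      = c("Game A", "Game B", "Game C"),
  Platform  = c("Wii", "PS2", "Wii"),
  Year      = c("2006", "N/A", "2008"),
  Genre     = c("Sports", "Action", "Racing"),
  Publisher = c("Nintendo", "N/A", "Nintendo"),
  stringsAsFactors = FALSE
)

# Convert only the categorical columns to factors; Name stays character
cat_cols <- c("Platform", "Year", "Genre", "Publisher")
df[cat_cols] <- lapply(df[cat_cols], factor)

str(df)  # "N/A" shows up as an ordinary level of Year and Publisher
```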

Now the categorical data is interpreted correctly by R. When we look at the data, however, we see that 'N/A' is used to represent missing values; if we do not change it, R will not recognize it as a missing value and we will get error-prone results.

##       Rank           Name              Platform         Year     
##  Min.   :    1   Length:16598       DS     :2163   2009   :1431  
##  1st Qu.: 4151   Class :character   PS2    :2161   2008   :1428  
##  Median : 8300   Mode  :character   PS3    :1329   2010   :1259  
##  Mean   : 8301                      Wii    :1325   2007   :1202  
##  3rd Qu.:12450                      X360   :1265   2011   :1139  
##  Max.   :16600                      PSP    :1213   (Other):9868  
##                                     (Other):7142   NA's   : 271  
##           Genre                             Publisher    
##  Action      :3316   Electronic Arts             : 1351  
##  Sports      :2346   Activision                  :  975  
##  Misc        :1739   Namco Bandai Games          :  932  
##  Role-Playing:1488   Ubisoft                     :  921  
##  Shooter     :1310   Konami Digital Entertainment:  832  
##  Adventure   :1286   (Other)                     :11529  
##  (Other)     :5113   NA's                        :   58  
##     NA_Sales          EU_Sales          JP_Sales         Other_Sales      
##  Min.   : 0.0000   Min.   : 0.0000   Min.   : 0.00000   Min.   : 0.00000  
##  1st Qu.: 0.0000   1st Qu.: 0.0000   1st Qu.: 0.00000   1st Qu.: 0.00000  
##  Median : 0.0800   Median : 0.0200   Median : 0.00000   Median : 0.01000  
##  Mean   : 0.2647   Mean   : 0.1467   Mean   : 0.07778   Mean   : 0.04806  
##  3rd Qu.: 0.2400   3rd Qu.: 0.1100   3rd Qu.: 0.04000   3rd Qu.: 0.04000  
##  Max.   :41.4900   Max.   :29.0200   Max.   :10.22000   Max.   :10.57000  
##                                                                           
##   Global_Sales    
##  Min.   : 0.0100  
##  1st Qu.: 0.0600  
##  Median : 0.1700  
##  Mean   : 0.5374  
##  3rd Qu.: 0.4700  
##  Max.   :82.7400  
## 
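
One way to turn the 'N/A' strings into real missing values after the factor conversion is sketched below, using hypothetical toy data. Alternatively, `read.csv()` accepts `na.strings = "N/A"` so that the recoding happens while the file is read; which approach the original analysis used is not stated, so both are assumptions.

```r
# Hypothetical toy data (not real rows); "N/A" marks missing values as in the CSV
df <- data.frame(
  Year      = factor(c("2006", "N/A", "2008")),
  Publisher = factor(c("Nintendo", "N/A", "Sega"))
)

# Recode the literal "N/A" to a real NA in every factor column,
# then drop the now-unused "N/A" level
for (col in names(df)) {
  if (is.factor(df[[col]])) {
    df[[col]][df[[col]] == "N/A"] <- NA
    df[[col]] <- droplevels(df[[col]])
  }
}

colSums(is.na(df))  # number of missing values per column
```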

Now we will check the consistency of the data: whether the data inside each column is homogeneous and whether its values are feasible.

Taking the mean of the differences between the actual sale (calculated by summing the sales from all regions) and the Global_Sales attribute, we get:

## [1] 0.0002765393

So the Global_Sales attribute does not always equal the sum of the regional sales and contains some errors. Since the values are in millions, even a small per-game error adds up to a significant amount, so let's replace Global_Sales with the sum of JP_Sales, NA_Sales, EU_Sales and Other_Sales.
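
The check and the fix can be sketched as follows on hypothetical toy rows (the second row is deliberately off by 0.01). Besides the signed mean used above, the sketch also computes the mean absolute difference, since errors in opposite directions can cancel in a signed mean; that extra check is an addition, not part of the original analysis.

```r
# Hypothetical toy rows (not real data); sales are in millions
df <- data.frame(
  NA_Sales     = c(1.00, 0.50, 0.20),
  EU_Sales     = c(0.50, 0.30, 0.10),
  JP_Sales     = c(0.20, 0.00, 0.05),
  Other_Sales  = c(0.10, 0.05, 0.02),
  Global_Sales = c(1.80, 0.86, 0.37)  # second row deliberately wrong
)

region_cols <- c("NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales")
actual_sale <- unname(rowSums(df[region_cols]))

diffs <- actual_sale - df$Global_Sales
mean(diffs)       # signed mean, as in the text
mean(abs(diffs))  # mean absolute difference: errors cannot cancel out

# Count rows that disagree; the tolerance absorbs floating-point noise
n_mismatch <- sum(abs(diffs) > 1e-9)

# Replace Global_Sales with the recomputed regional total
df$Global_Sales <- actual_sale
```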

The long tail in the graph clearly shows that very few games have total sales greater than 75 million. These are most probably the most popular games; if not, they may be outliers. We also have to check the data for duplicates.
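
A sketch of two ways to flag the long tail, on hypothetical toy totals: the fixed 75 million cut-off from the text, and Tukey's fence (Q3 + 1.5 × IQR), a common rule of thumb swapped in here for illustration. On right-skewed sales data the fence is likely to flag many titles, which is why having a strong reason before removing outliers matters.

```r
# Hypothetical toy totals in millions (not real data)
global_sales <- c(0.05, 0.10, 0.17, 0.30, 0.47, 0.90, 82.74)

# Tukey's upper fence: Q3 + 1.5 * IQR (quantile() uses R's default type 7)
q <- quantile(global_sales, c(0.25, 0.75))
upper_fence <- q[[2]] + 1.5 * (q[[2]] - q[[1]])
outliers <- global_sales[global_sales > upper_fence]
outliers

# Games above the fixed 75 million mark mentioned in the text
above_75 <- global_sales[global_sales > 75]
above_75
```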

## # A tibble: 2,775 x 2
##    Name                         count
##    <chr>                        <int>
##  1 Need for Speed: Most Wanted     12
##  2 FIFA 14                          9
##  3 LEGO Marvel Super Heroes         9
##  4 Madden NFL 07                    9
##  5 Ratatouille                      9
##  6 Angry Birds Star Wars            8
##  7 Cars                             8
##  8 FIFA 15                          8
##  9 FIFA Soccer 13                   8
## 10 Lego Batman 3: Beyond Gotham     8
## # ... with 2,765 more rows

So there are 2,775 video game titles that were published more than once, typically the same title released on several platforms. Presumably these titles were expected to earn a lot, which would explain the multiple releases.
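
The tibble above suggests the original used a dplyr count; a base-R equivalent is sketched below on hypothetical toy rows. It separates two different things: exact duplicate rows (every column identical, which would be a data-entry problem) and titles whose name simply repeats across platforms.

```r
# Hypothetical toy rows (not real data)
df <- data.frame(
  Name     = c("FIFA 14", "FIFA 14", "Tetris", "Cars", "Cars", "FIFA 14"),
  Platform = c("PS3", "X360", "GB", "PS2", "Wii", "PS4"),
  stringsAsFactors = FALSE
)

# Rows that are identical in every column
n_exact_dups <- sum(duplicated(df))

# Titles that appear more than once, most frequent first
name_counts <- table(df$Name)
repeated <- sort(name_counts[name_counts > 1], decreasing = TRUE)
repeated
```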

In the next section we will analyse the trends, look for correlations and answer various curious questions.

0.1.3 Univariate

Here we look at how the data is spread and at its central tendencies to get direct insights into the data.

0.1.3.1 Yearly Increase in Video Game Development

## 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 
##    9   46   36   17   14   14   21   16   15   17   16   41   43   60  121 
## 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 
##  219  263  289  379  338  349  482  829  775  744  936 1008 1201 1428 1431 
## 2010 2011 2012 2013 2014 2015 2016 2017 2020 
## 1257 1136  655  546  580  614  342    3    1

The histogram clearly shows an abrupt decline in video game production from 2012 onwards (from 1,136 titles in 2011 to 655 in 2012). This may also point to fewer jobs for video game developers around 2014, although release counts alone are only indirect evidence of that. The graph falls abruptly after 2016 (3 titles in 2017 and 1 in 2020), which indicates a problem with data gathering after 2016: the data for those years is inconsistent, so we will limit our study to the years up to 2016.
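
A sketch of the yearly count and the cut-off at 2016, assuming Year is a factor (as in the summary above) and using hypothetical toy rows. The conversion goes through `as.character()` first, because `as.numeric()` on a factor returns the internal level codes rather than the years.

```r
# Hypothetical toy rows (not real data); Year is a factor with one missing value
df <- data.frame(
  Name = paste("Game", 1:6),
  Year = factor(c("2009", "2012", "2016", "2017", NA, "2020")),
  stringsAsFactors = FALSE
)

# Factor -> character -> numeric, so "2009" becomes 2009 rather than a level code
df$Year_num <- as.numeric(as.character(df$Year))

table(df$Year_num)  # releases per year; NA years are excluded by default

# Keep only years up to 2016, where data collection looks complete
df_2016 <- subset(df, !is.na(Year_num) & Year_num <= 2016)
```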

0.1.3.2 Genre-wise Number of Games Developed

This graph shows which genres have the most games. Action games are at the top, followed by Sports; an interesting insight here is that Miscellaneous games rank third.
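
The counts behind such a graph can be sketched with `table()`; the genre vector below is hypothetical toy data, and the original plot was presumably drawn with ggplot2 rather than from this exact code.

```r
# Hypothetical toy genres (not real data)
genre <- factor(c("Action", "Sports", "Action", "Misc", "Action", "Sports"))

# Number of games per genre, largest first
genre_counts <- sort(table(genre), decreasing = TRUE)
genre_counts
```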

0.1.3.3 Game Genre Distribution by Country

Let's look at the distribution of companies developing games in each genre.
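
One way to measure this is the number of distinct publishers per genre, sketched below on hypothetical toy rows; ignoring missing publishers is an assumption of this sketch.

```r
# Hypothetical toy rows (not real data)
df <- data.frame(
  Genre     = c("Action", "Action", "Action", "Sports", "Sports", "Sports"),
  Publisher = c("Ubisoft", "Activision", "Ubisoft",
                "Electronic Arts", "Electronic Arts", NA),
  stringsAsFactors = FALSE
)

# Distinct publishers releasing at least one game in each genre (NAs ignored)
publishers_per_genre <- tapply(df$Publisher, df$Genre,
                               function(p) length(unique(p[!is.na(p)])))
publishers_per_genre
```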

0.1.3.4 Country-wise Sales Analysis

Here we can clearly see that most of the sales come from North America. From a marketing point of view, however, this is not a great metric: Japan has a smaller population, so if we incorporate population into the metric, things may look different.
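
Regional totals and shares can be sketched with `colSums()` on hypothetical toy rows. The per-capita helper is hypothetical as well: it expects a named vector `pop` of regional populations, which the dataset does not contain and which you would have to supply.

```r
# Hypothetical toy rows (not real data); sales in millions
df <- data.frame(
  NA_Sales    = c(1.00, 0.50, 0.20),
  EU_Sales    = c(0.50, 0.30, 0.10),
  JP_Sales    = c(0.20, 0.00, 0.30),
  Other_Sales = c(0.10, 0.05, 0.05)
)

region_totals <- colSums(df)
region_share  <- round(100 * region_totals / sum(region_totals), 1)  # percent
region_share

# Hypothetical helper: sales per person, given a named population vector `pop`
# whose names match the sales columns (e.g. c(NA_Sales = ..., JP_Sales = ...))
sales_per_capita <- function(totals, pop) totals / pop[names(totals)]
```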

0.1.4 Bivariate

Here we will look at the effect on one variable as a second variable changes.

0.2 Getting Some Insights

0.2.1 Top 10 Revenue-Generating Games

Name                            total_sale
Wii Sports                      82.74
Grand Theft Auto V              55.92
Super Mario Bros.               45.31
Tetris                          35.84
Mario Kart Wii                  35.82
Wii Sports Resort               33.00
Pokemon Red/Pokemon Blue        31.37
Call of Duty: Modern Warfare 3  30.83
New Super Mario Bros.           30.01
Call of Duty: Black Ops II      29.72

So up to 2016 these games have the highest global revenue. Wii Sports, a Nintendo game, is at the top, which a Google search also confirms (quite interesting, actually); Grand Theft Auto V is second, followed by Super Mario Bros.
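
A ranking like the one above can be sketched by summing sales per title across platforms and keeping the ten largest. Whether the original table aggregated by Name in exactly this way is an assumption; the rows below are hypothetical toy data.

```r
# Hypothetical toy rows (not real data); Game B is released on two platforms
df <- data.frame(
  Name         = c("Game A", "Game B", "Game B", "Game C"),
  Global_Sales = c(40.0, 20.0, 15.0, 30.0),
  stringsAsFactors = FALSE
)

# Total sales per title, summed across platforms
by_name <- aggregate(Global_Sales ~ Name, data = df, FUN = sum)
names(by_name)[names(by_name) == "Global_Sales"] <- "total_sale"

# Sort descending and keep at most the first 10 rows
top10 <- head(by_name[order(-by_name$total_sale), ], 10)
top10
```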

0.2.2 Top 5 Revenue Generating Genres

For a game developer, finding the sweet spot is important to make revenue in such a competitive market. Let's first find which genre generates the most revenue; after that we will find which genre has the least competition, measured as the total revenue divided by the total number of video games released in that genre.
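
The first step, total revenue per genre, can be sketched with `aggregate()` on hypothetical toy rows:

```r
# Hypothetical toy rows (not real data); sales in millions
df <- data.frame(
  Genre        = c("Action", "Sports", "Action", "Misc", "Shooter"),
  Global_Sales = c(1.5, 2.0, 0.7, 0.3, 1.2),
  stringsAsFactors = FALSE
)

# Sum sales per genre, then sort from highest to lowest revenue
genre_rev <- aggregate(Global_Sales ~ Genre, data = df, FUN = sum)
names(genre_rev)[names(genre_rev) == "Global_Sales"] <- "total_revenue"
ranked <- genre_rev[order(-genre_rev$total_revenue), ]
ranked
```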
Genre         total_revenue
Action        1722.84
Sports        1309.24
Shooter       1026.20
Role-Playing  923.83
Platform      829.13
Misc          789.87
Racing        726.76
Fighting      444.05
Simulation    389.98
Puzzle        242.21
Adventure     234.59
Strategy      173.27

Here we can see that Misc ranks 3rd by number of games, but in terms of revenue generation it is only 6th, far behind Action and Sports.

0.2.3 Sweet Spot!!

Now let's find which genre has the least competition, i.e. the fewest video games relative to the revenue it generates.

The metric is the total revenue generated by a genre divided by the total number of video games in that genre:
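
A sketch of how `ease_metric` can be computed, using hypothetical toy rows: total revenue and game count per genre via `tapply()`, then their ratio, sorted from highest to lowest. The column names mirror the table below, but the exact code of the original analysis is not shown, so this is only an illustration.

```r
# Hypothetical toy rows (not real data); sales in millions
df <- data.frame(
  Genre        = c("Platform", "Platform", "Action", "Action", "Action", "Action"),
  Global_Sales = c(1.0, 0.9, 0.8, 0.6, 0.4, 0.2),
  stringsAsFactors = FALSE
)

# Both tapply() calls group by the same sorted genre names, so they line up
total_revenue <- tapply(df$Global_Sales, df$Genre, sum)
count         <- tapply(df$Global_Sales, df$Genre, length)

ease <- data.frame(
  Genre         = names(total_revenue),
  total_revenue = as.vector(total_revenue),
  count         = as.vector(count),
  stringsAsFactors = FALSE
)
# Average revenue per game: higher means more revenue per title released
ease$ease_metric <- ease$total_revenue / ease$count
ease <- ease[order(-ease$ease_metric), ]
ease
```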

Genre         total_revenue  count  ease_metric
Platform      829.13         875    0.9475771
Shooter       1026.20        1282   0.8004680
Role-Playing  923.83         1470   0.6284558
Racing        726.76         1225   0.5932735
Sports        1309.24        2304   0.5682465
Fighting      444.05         836    0.5311603
Action        1722.84        3251   0.5299416
Misc          789.87         1686   0.4684875
Simulation    389.98         848    0.4598821
Puzzle        242.21         570    0.4249298
Strategy      173.27         670    0.2586119
Adventure     234.59         1274   0.1841366

Some interesting facts again: Platform and Shooter games generate a high average revenue per game, while Action games drop out of the top 5 (they are 7th), which clearly shows that there are a great number of Action games.